
    Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation

    This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, together with a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines, including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are above one BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, and +20% when scored manually on several challenging words.
    Comment: To appear in TAC
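    The sense selection mechanism described above, a weighted average of sense vectors, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the softmax weighting over dot-product similarities and all variable names are assumptions.

    ```python
    import numpy as np

    def select_sense_vector(context_vec, sense_vecs):
        """Soft sense selection: weight each candidate sense vector by the
        softmax of its dot-product similarity with the context vector, and
        return the weighted average (a convex combination of the senses)."""
        scores = sense_vecs @ context_vec            # one similarity score per sense
        weights = np.exp(scores - scores.max())      # numerically stable softmax
        weights /= weights.sum()
        return weights @ sense_vecs                  # weighted average of sense vectors

    # Illustrative usage: a word with three hypothetical 2-d sense vectors.
    senses = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    context = np.array([0.9, 0.1])
    blended = select_sense_vector(context, senses)
    ```

    Because the selection is soft rather than a hard argmax, the resulting vector stays differentiable and can be trained jointly with the NMT system, as the abstract describes.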

    Reference-based vs. task-based evaluation of human language technology

    This paper starts from the ISO distinction of three types of evaluation procedures – internal, external and in use – and proposes to match these types to the three types of human language technology (HLT) systems: analysis, generation, and interactive. The paper explains why internal evaluation is not suitable for measuring the qualities of HLT systems, and shows that reference-based external evaluation is best adapted to 'analysis' systems and task-based evaluation to 'interactive' systems, while 'generation' systems can be subject to both types of evaluation. In particular, some limits of reference-based external evaluation are shown in the case of generation systems. Finally, the paper shows that contextual evaluation, as illustrated by the FEMTI framework for MT evaluation, is an effective method for bringing reference-based evaluation closer to the users of a system.

    Comparing meeting browsers using a task-based evaluation method

    Information access within meeting recordings, potentially transcribed and augmented with other media, is facilitated by the use of meeting browsers. To evaluate their performance through a shared benchmark task, users are asked to discriminate between true and false parallel statements about facts in meetings, using different browsers. This paper offers a review of the results obtained so far with five types of meeting browsers, using similar sets of statements over the same meeting recordings. The results indicate that state-of-the-art speed for true/false question answering is 1.5-2 minutes per question, and precision is 70%-80% (vs. 50% for random guessing). The use of ASR output instead of manual transcripts, or of audio signals only, leads to a perceptible though not dramatic decrease in performance scores.
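    The benchmark metrics above can be computed straightforwardly. This is a hedged sketch of the scoring logic only; the function name, data layout, and example figures are illustrative, not taken from the evaluation framework itself.

    ```python
    def score_browser(judgments, times_sec):
        """Score one browser run on the true/false benchmark:
        precision = fraction of statements judged correctly (chance level 0.5),
        plus the average answering time in minutes per question.
        `judgments` is a list of (user_answer, ground_truth) booleans."""
        correct = sum(1 for guess, truth in judgments if guess == truth)
        precision = correct / len(judgments)
        avg_minutes = sum(times_sec) / len(times_sec) / 60.0
        return precision, avg_minutes

    # Illustrative run: 10 statements, 8 judged correctly, 100 s per question.
    run = [(True, True)] * 8 + [(True, False)] * 2
    prec, mins = score_browser(run, times_sec=[100] * 10)
    ```

    With these invented figures the run would score 80% precision at roughly 1.7 minutes per question, i.e. within the state-of-the-art range the abstract reports.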

    Dimensionality of Dialogue Act Tagsets: An Empirical Analysis of Large Corpora

    This article compares one-dimensional and multi-dimensional dialogue act tagsets used for automatic labeling of utterances. The influence of tagset dimensionality on tagging accuracy is first discussed theoretically, then examined using empirical data from human and automatic annotations of large-scale resources, based on four existing tagsets: DAMSL, SWBD-DAMSL, ICSI-MRDA and MALTUS. The Dominant Function Approximation proposes that automatic dialogue act taggers could focus initially on finding the main dialogue function of each utterance, which is empirically acceptable and has significant practical relevance.
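    The idea behind the Dominant Function Approximation, assigning each utterance a single main dialogue function rather than a full multi-dimensional label, can be illustrated with a toy tagger. The keyword rules below are invented for this sketch and are not taken from DAMSL, MALTUS, or the article's experiments.

    ```python
    def dominant_function(utterance):
        """Toy one-dimensional dialogue act tagger: return a single dominant
        function per utterance (question / answer / statement), instead of a
        multi-dimensional label. Rules are illustrative placeholders."""
        text = utterance.lower().rstrip()
        if text.endswith("?"):
            return "question"
        if any(text.startswith(w) for w in ("yes", "no", "yeah", "okay")):
            return "answer"
        return "statement"

    tags = [dominant_function(u) for u in
            ["Shall we start?", "Yeah, let's begin.", "The budget is fixed."]]
    ```

    A real tagger would of course learn such decisions from annotated corpora; the point of the sketch is only that a one-dimensional output space keeps the tagging decision simple.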

    Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German

    The goal of this work is to design a machine translation (MT) system for a low-resource family of dialects, collectively known as Swiss German, which are widely spoken in Switzerland but seldom written. As a starting point, we collected a significant number of parallel written resources, totaling about 60k words. Moreover, we identified several other promising data sources for Swiss German. Then, we designed and compared three strategies for normalizing Swiss German input in order to address the regional diversity. We found that character-based neural MT was the best solution for text normalization. In combination with phrase-based statistical MT, our solution reached a 36% BLEU score when translating from the Bernese dialect. This value, however, decreases as the test data becomes more distant from the training data, both geographically and topically. These resources and normalization techniques are a first step towards full MT of Swiss German dialects.
    Comment: 11th Language Resources and Evaluation Conference (LREC), 7-12 May 2018, Miyazaki (Japan)
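    To make the normalization step concrete, here is a deliberately simple stand-in: mapping a dialect token to the closest entry of a normalized lexicon by character similarity. This is not the paper's method (its best system learns the mapping with character-based neural MT); the lexicon and the Bernese-style spelling are invented for illustration.

    ```python
    from difflib import SequenceMatcher

    def normalize_token(dialect_tok, lexicon):
        """Toy normalizer: map a Swiss German spelling variant to the most
        character-similar entry of a normalized (Standard German) lexicon.
        A crude baseline, standing in for the learned character-level model."""
        return max(lexicon,
                   key=lambda w: SequenceMatcher(None, dialect_tok, w).ratio())

    # Illustrative lexicon and a Bernese-style spelling variant of 'Haus'.
    lexicon = ["haus", "hund", "hand"]
    norm = normalize_token("huus", lexicon)
    ```

    The sketch shows why normalization helps at all: once regional spelling variants are mapped onto a shared written form, a downstream phrase-based MT system sees far less vocabulary fragmentation.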